
    Experiment & Modeling in Systems Biology

    BKM-react, an integrated biochemical reaction database

    Background: The systematic, complete and correct reconstruction of genome-scale metabolic networks or metabolic pathways is one of the most challenging tasks in systems biology research. An essential requirement is access to complete biochemical knowledge, especially about biochemical reactions. This knowledge is extracted from the scientific literature and collected in biological databases. Since the available databases differ both in the number of biochemical reactions they contain and in how those reactions are annotated, an integrated knowledge resource would be of great value.
    Results: We developed a comprehensive, non-redundant reaction database containing known enzyme-catalyzed and spontaneous reactions. It currently comprises 18,172 unique biochemical reactions. The biochemical databases BRENDA, KEGG and MetaCyc were used as sources. Reactions from these databases were matched and integrated by aligning their substrates and products; the compounds were compared in two steps, using their structures (via InChIs) and their names. Every biochemical reaction given as a reaction equation in at least one of the source databases was included.
    Conclusions: An integrated, non-redundant reaction database has been developed and is available to users. The database can significantly facilitate and accelerate the construction of accurate biochemical models.
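The matching step described above (compare compounds by structure via InChI, fall back to names, then treat reactions as identical when their substrate and product sets align) can be sketched in Python. This is a minimal illustration under stated assumptions, not the BKM-react implementation: the helper names `normalise`, `CompoundRegistry`, `reaction_key` and `merge`, the name normalisation rule, and the choice to ignore reaction direction and stoichiometry are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

# A compound as it appears in a source record: (name, InChI or None).
Compound = Tuple[str, Optional[str]]


def normalise(name: str) -> str:
    # Hypothetical name normalisation: case-insensitive, collapsed whitespace.
    return " ".join(name.lower().split())


class CompoundRegistry:
    """Assigns one internal ID per compound, matching by InChI first, then by name."""

    def __init__(self) -> None:
        self.by_inchi: Dict[str, int] = {}
        self.by_name: Dict[str, int] = {}
        self.inchi_of: Dict[int, str] = {}
        self.next_id = 0

    def resolve(self, name: str, inchi: Optional[str] = None) -> int:
        key = normalise(name)
        # Step 1: structural comparison via the InChI string.
        cid = self.by_inchi.get(inchi) if inchi else None
        # Step 2: name comparison, unless it would contradict a known structure.
        if cid is None:
            candidate = self.by_name.get(key)
            if candidate is not None and (inchi is None or candidate not in self.inchi_of):
                cid = candidate
        if cid is None:  # no match: register a new compound
            cid = self.next_id
            self.next_id += 1
        # Remember both identifiers so later records can match either way.
        if inchi and cid not in self.inchi_of:
            self.inchi_of[cid] = inchi
            self.by_inchi[inchi] = cid
        self.by_name.setdefault(key, cid)
        return cid


def reaction_key(registry: CompoundRegistry,
                 substrates: Iterable[Compound],
                 products: Iterable[Compound]) -> FrozenSet[FrozenSet[int]]:
    left = frozenset(registry.resolve(n, i) for n, i in substrates)
    right = frozenset(registry.resolve(n, i) for n, i in products)
    # Unordered pair of sides, so a reaction and its reverse get the same key
    # (assumption: direction and stoichiometry are ignored).
    return frozenset([left, right])


def merge(registry: CompoundRegistry, *databases) -> Dict[FrozenSet[FrozenSet[int]], Set[str]]:
    """Each database is a list of (substrates, products, source_name) records."""
    merged: Dict[FrozenSet[FrozenSet[int]], Set[str]] = {}
    for db in databases:
        for substrates, products, source in db:
            key = reaction_key(registry, substrates, products)
            merged.setdefault(key, set()).add(source)
    return merged
```

Usage: create one `CompoundRegistry`, pass the reaction lists of all sources to `merge`, and each key of the result is one non-redundant reaction mapped to the set of databases it occurs in. Because the name fallback depends on the order in which records are seen, a real pipeline would likely need a more careful conflict policy.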

    KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Background: The amount of available biological information is increasing rapidly, and the focus of biological research has moved from single components to networks, and even to larger projects aiming at the analysis, modelling and simulation of biological networks and at the large-scale comparison of cellular properties. It is therefore essential that biological knowledge be easily accessible. However, most information is contained in the written literature in unstructured form, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed.
    Description: Here we present a text mining algorithm for extracting kinetic information such as K_M, K_i and k_cat, together with associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (K_M, K_i, k_cat, k_cat/K_M, V_max, IC_50, S_0.5, K_d, K_a, t_1/2, pI, n_H, specific activity, V_max/K_M) from about 17 million PubMed abstracts and to combine them with other data in the abstract.
    A manual verification of approximately 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision between 55% and 96%, depending on the category searched.
    The results were stored in a database and are available online as "KID the KInetic Database".
    Conclusions: The presented algorithm delivers a considerable amount of information and may therefore help to accelerate the research and the automated analysis required by today's systems biology approaches. The database obtained by analysing PubMed abstracts may be of considerable help in the field of chemical and biological kinetics. Because it is based entirely on text mining, it complements manually curated databases.
    The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and is available on request from the author.
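The abstract does not list KID's actual rules or dictionaries, so the following is only a sketch of the general shape of rule- and dictionary-based extraction: a dictionary maps surface forms to a parameter category, and a regular-expression rule captures the value and unit that follow. The names `PARAMETER_SYNONYMS`, `UNITS`, `RULE` and `extract` are hypothetical, and the rule set is far smaller than anything usable on 17 million abstracts.

```python
import re
from typing import List, Tuple

# Hypothetical synonym dictionary: surface form (lower case) -> parameter category.
PARAMETER_SYNONYMS = {
    "km": "K_M",
    "michaelis constant": "K_M",
    "ki": "K_i",
    "kcat": "k_cat",
    "turnover number": "k_cat",
    "vmax": "V_max",
    "ic50": "IC_50",
}

# Hypothetical unit dictionary (illustrative subset); "mM" is tried before "M".
UNITS = "|".join(["mM", "uM", "\u00b5M", "nM", "pM", "M", "s-1", "min-1"])

# Longest names first so that multi-word forms win over their prefixes.
NAMES = "|".join(re.escape(s) for s in sorted(PARAMETER_SYNONYMS, key=len, reverse=True))

# Single rule: <parameter name> [= | : | of | was | is | were] <number> <unit>
RULE = re.compile(
    rf"\b(?i:(?P<name>{NAMES}))\b"      # dictionary lookup of the parameter name (case-insensitive)
    r"\s*(?:=|:|of|was|is|were)?\s*"    # optional linking word
    r"(?P<value>\d+(?:\.\d+)?)\s*"      # numeric value
    rf"(?P<unit>{UNITS})(?![A-Za-z])"   # unit (case-sensitive), not followed by further letters
)


def extract(text: str) -> List[Tuple[str, float, str]]:
    """Return (category, value, unit) triples matched by the single rule above."""
    hits = []
    for m in RULE.finditer(text):
        category = PARAMETER_SYNONYMS[m.group("name").lower()]
        hits.append((category, float(m.group("value")), m.group("unit")))
    return hits
```

A real system would need many more rules (ranges, ± errors, values stated before the parameter name) and would then link each hit to the enzyme, organism, pH and temperature mentioned in the same abstract.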

    mSpecs: a software tool for the administration and editing of mass spectral libraries in the field of metabolomics

    Background: Metabolome analysis with GC/MS is now established as one of the "omics" techniques. Compounds are identified by comparing the MS data with compound libraries. Mass spectral libraries in metabolomics should link the relevant mass traces of the metabolites to other relevant data, e.g. formulas, chemical structures and identifiers in other databases. Since existing solutions are either commercial, and therefore available only for certain instruments, or unable to store such information, there is a need for a software tool to manage such data.
    Results: Here we present mSpecs, an open-source software tool for managing mass spectral data in metabolomics. It supports editing of mass spectra and virtually any associated information, provides automatic calculation of formulas and masses, and is extensible by scripts. The graphical user interface supports common techniques such as copy/paste, undo/redo and drag and drop. It provides import and export filters for the major public file formats to ensure compatibility with commercial instruments.
    Conclusion: mSpecs is a versatile tool for managing and editing mass spectral libraries in metabolomics. Beyond that, its scripting functionality allows libraries to be managed automatically. mSpecs runs on all major platforms, is licensed under the GNU General Public License and is available at http://mspecs.tu-bs.de.
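How mSpecs calculates masses is not described here; as a sketch of the underlying arithmetic, the functions below sum monoisotopic element masses for a simple molecular formula. The element table is a small subset (silicon is included because trimethylsilyl derivatives are common in GC/MS), and the parser accepts only plain formulas without brackets, charges or isotope labels; the names `MONOISOTOPIC`, `parse_formula` and `monoisotopic_mass` are hypothetical.

```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotope of each element (subset).
MONOISOTOPIC: Dict[str, float] = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "Si": 27.9769265325,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a plain formula such as 'C6H12O6' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic element masses; raises KeyError for elements not in the table."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
```

For glucose, `monoisotopic_mass("C6H12O6")` gives about 180.0634 u.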

    Development of a classification scheme for disease-related enzyme information

    Background: BRENDA (BRaunschweig ENzyme DAtabase, http://www.brenda-enzymes.org) is a major resource for enzyme-related information. First and foremost, it provides data manually curated from the primary literature. DRENDA (Disease RElated ENzyme information DAtabase) complements BRENDA with a focus on the automatic search and categorisation of enzyme- and disease-related information from the titles and abstracts of primary publications. DRENDA uses a two-step procedure based on text mining and machine learning methods.
    Results: Enzyme- and disease-related references are currently updated biannually as part of the standard BRENDA update. The second release, in 2010, includes 910,897 relations between EC numbers and diseases extracted from titles or abstracts. Enzyme and disease entity recognition has been successfully enhanced by an additional relation classification step using machine learning. This classification step was evaluated by 5-fold cross-validation and achieves F1 scores between 0.738 ± 0.033 and 0.802 ± 0.032, depending on the category and the pre-processing procedure. In the final DRENDA content, every category reaches a classification specificity of at least 96.7%, with precision ranging from 86% to 98% at the highest confidence level and from 64% to 83% at the lowest confidence level, which is associated with higher recall.
    Conclusions: The DRENDA processing chain analyses PubMed, locates references containing disease-related information on enzymes and categorises their focus into the categories "causal interaction", "therapeutic application", "diagnostic usage" and "ongoing research". The categorisation gives an impression of the focus of the located references. The relation categorisation can thus facilitate orientation within the rapidly growing number of references relevant to both diseases and enzymes. The DRENDA information is available as additional information in BRENDA.
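The evaluation above reports F1 scores from 5-fold cross-validation together with specificity and precision. The helpers below show how these quantities are conventionally computed from a binary confusion matrix and how a data set can be split into k folds. They are a generic sketch, not DRENDA's evaluation code; the function names are hypothetical, and reading the "±" values as the sample standard deviation across folds is an assumption.

```python
from statistics import mean, stdev
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, specificity and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def k_fold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def f1_mean_std(fold_counts: Sequence[Tuple[int, int, int, int]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of F1 over folds, given (tp, fp, tn, fn) per fold."""
    scores = [binary_metrics(*counts)["f1"] for counts in fold_counts]
    return mean(scores), stdev(scores)
```

With per-fold counts from a classifier, `f1_mean_std` yields a mean and a spread in the "value ± deviation" form used for the F1 scores above. In practice the data would usually be shuffled (or stratified by category) before splitting.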

    CUPSAT: prediction of protein stability upon point mutations

    CUPSAT (Cologne University Protein Stability Analysis Tool) is a web tool for analysing and predicting protein stability changes upon point mutations (single amino acid mutations). The program uses structural-environment-specific atom potentials and torsion angle potentials to predict ΔΔG, the difference in free energy of unfolding between wild-type and mutant proteins. It requires the protein structure in Protein Data Bank format and the location of the residue to be mutated. The output includes information about the mutation site and its structural features (solvent accessibility, secondary structure and torsion angles), together with comprehensive information about the change in protein stability for each of the 19 possible substitutions at that position. In addition, it analyses whether the substituted amino acids can adopt the observed torsion angles. The method was tested on 1538 mutations from thermal denaturation experiments and 1603 mutations from chemical denaturation experiments. Several validation tests (split-sample, jack-knife and k-fold) were carried out to assess the reliability, accuracy and transferability of the prediction method, which achieves more than 80% prediction accuracy in most of these tests. The program is thus a valuable tool for the analysis of protein design and stability. The tool is accessible from the link
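Two bookkeeping steps in this description lend themselves to a short sketch: enumerating the 19 alternative residues at a mutation site, and scoring predictions by whether they classify each mutation correctly as stabilizing or destabilizing. The sketch assumes that "prediction accuracy" means agreement in this classification and that a positive ΔΔG denotes stabilization; both the metric definition and the sign convention are assumptions to check against CUPSAT's own documentation, and the function names are hypothetical. The energy model itself (atom and torsion angle potentials) is not reproduced.

```python
from typing import List, Sequence

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitutions(wild_type: str) -> List[str]:
    """All 19 possible replacements for a wild-type residue."""
    if len(wild_type) != 1 or wild_type not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {wild_type!r}")
    return [aa for aa in AMINO_ACIDS if aa != wild_type]


def is_stabilizing(ddg: float) -> bool:
    # Assumed sign convention: positive ddG = stabilizing, otherwise destabilizing.
    return ddg > 0


def sign_accuracy(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of mutations whose predicted ddG falls in the same class as the experimental one."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    agree = sum(is_stabilizing(e) == is_stabilizing(p) for e, p in zip(experimental, predicted))
    return agree / len(experimental)
```

For example, if three of four predicted ΔΔG values have the same sign as the measured ones, `sign_accuracy` returns 0.75.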

    Computational modeling of protein mutant stability: analysis and optimization of statistical potentials and structural features reveal insights into prediction model development

    Background: Understanding and predicting protein stability upon point mutations is of widespread importance in molecular biology. Several prediction models based on various algorithms have been developed. Statistical potentials are among the most widely used approaches for predicting stability changes upon point mutations. Although these methods offer flexibility and the capability to build an accurate and reliable prediction model, this can be achieved only through the right selection of structural factors and the optimization of their parameters for the statistical potentials. In this work, we selected five atom classification systems and compared their efficiency for the development of amino acid atom potentials. Additionally, torsion angle potentials were optimized to include the orientation of amino acids, so that altered backbone conformations in different secondary structure regions can be included in the prediction model. This study also elaborates on the importance of classifying mutations according to their solvent accessibility and secondary structure specificity. The prediction efficiency was calculated individually for mutations in different secondary structure regions and compared.
    Results: The results show that, in addition to an advanced atom description, stepwise regression and selection of atoms are necessary to avoid redundancy in the atom distribution and to improve the reliability of the prediction model validation. Compared with the other atom classification models, the Melo-Feytmans model shows better prediction efficiency, giving a high correlation of 0.85 between experimental and theoretical ΔΔG, with 84.06% of 1538 mutations correctly predicted. For mutations in partially buried β-strands, the theoretical ΔΔG values generated with the structural training dataset from PISCES gave a correlation of 0.84 without Gaussian apodization of the torsion angle distribution. After Gaussian apodization, the correlation increased to 0.92 and the prediction accuracy from 80% to 88.89%.
    Conclusion: These findings were useful for optimizing the Melo-Feytmans atom classification system and implementing it in the development of the statistical potentials. It was also significant that Gaussian apodization of the torsion angle distribution improves the prediction efficiency for mutations in partially buried β-strands. Together, these comparisons and optimization techniques show both the advantages and the limitations of the approaches used to develop the prediction model. These findings will be helpful not only for protein stability prediction but also for various structure solutions in the future.
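The abstract does not spell out the apodization procedure. One common reading, assumed here, is that the torsion angle histogram is smoothed by convolving it with a Gaussian kernel, wrapping around at ±180° because torsion angles are periodic, before it is turned into a statistical potential of the inverse-Boltzmann form -ln(P_obs / P_ref). The bin width, kernel width, uniform reference distribution, pseudocount and function names below are illustrative choices, not the paper's parameters. A plain Pearson correlation, the usual meaning of "correlation between experimental and theoretical ΔΔG", is included for completeness.

```python
import math
from typing import List, Sequence


def torsion_histogram(angles_deg: Sequence[float], bin_width: int = 10) -> List[int]:
    """Count torsion angles (degrees) into periodic bins covering [-180, 180)."""
    if 360 % bin_width:
        raise ValueError("bin_width must divide 360")
    counts = [0] * (360 // bin_width)
    for angle in angles_deg:
        counts[int(((angle + 180.0) % 360.0) // bin_width)] += 1
    return counts


def gaussian_apodize(counts: Sequence[float], sigma_bins: float = 1.0) -> List[float]:
    """Circular convolution with a normalized Gaussian kernel (preserves the total count)."""
    n = len(counts)
    half = int(math.ceil(3 * sigma_bins))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (d / sigma_bins) ** 2) for d in offsets]
    norm = sum(weights)
    return [sum(w * counts[(i + d) % n] for d, w in zip(offsets, weights)) / norm
            for i in range(n)]


def torsion_potential(smoothed: Sequence[float], pseudocount: float = 1e-3) -> List[float]:
    """Energies in units of kT: -ln(P_obs / P_ref) with a uniform reference distribution."""
    n = len(smoothed)
    total = sum(smoothed) + pseudocount * n
    return [-math.log(((c + pseudocount) / total) * n) for c in smoothed]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, e.g. between experimental and predicted ddG."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The periodic wrap matters: without it, angles near +180° and -180° would be smoothed as if they were far apart, although they describe nearly the same conformation.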